AITopics | extractive summarization

Document Summarization with Conformal Importance Guarantees

Neural Information Processing Systems, Jun-17-2026, 17:46:13 GMT

Automatic summarization systems have advanced rapidly with large language models (LLMs), yet they still lack reliable guarantees on inclusion of critical content in high-stakes domains like healthcare, law, and finance. In this work, we introduce Conformal Importance Summarization, the first framework for importance-preserving summary generation which uses conformal prediction to provide rigorous, distribution-free coverage guarantees. By calibrating thresholds on sentence-level importance scores, we enable extractive document summarization with user-specified coverage and recall rates over critical content. Our method is model-agnostic, requires only a small calibration set, and seamlessly integrates with existing black-box LLMs. Experiments on established summarization benchmarks demonstrate that Conformal Importance Summarization achieves the theoretically assured information coverage rate. Our work suggests that Conformal Importance Summarization can be combined with existing techniques to achieve reliable, controllable automatic summarization, paving the way for safer deployment of AI summarization tools in critical applications.
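The threshold calibration described in this abstract can be illustrated with a minimal sketch of split conformal calibration. This is not the authors' implementation: the function names, the calibration data layout (per-document sentence scores plus boolean importance labels), and the symbols `alpha` (allowed miscoverage) and `beta` (target recall of important sentences) are assumptions made for illustration.

```python
import math

def recall_threshold(scores, important, beta):
    """Largest threshold t such that keeping sentences with score >= t
    retains at least a fraction beta of the important sentences."""
    imp = sorted((s for s, flag in zip(scores, important) if flag), reverse=True)
    needed = math.ceil(beta * len(imp))
    if needed == 0:
        return math.inf  # nothing must be recalled, so any threshold works
    return imp[needed - 1]

def calibrate(cal_docs, alpha, beta):
    """Pick a global threshold q from calibration documents.

    cal_docs is a list of (scores, important) pairs. Assuming the test
    document is exchangeable with the calibration documents, keeping
    sentences with score >= q reaches recall >= beta with probability
    at least 1 - alpha.
    """
    per_doc = sorted(recall_threshold(s, m, beta) for s, m in cal_docs)
    k = math.floor(alpha * (len(per_doc) + 1))
    if k == 0:
        return -math.inf  # too few calibration documents: keep every sentence
    return per_doc[k - 1]

def summarize(sentences, scores, q):
    """Extractive summary: keep sentences whose score clears the threshold."""
    return [sent for sent, sc in zip(sentences, scores) if sc >= q]
```

A smaller `alpha` or a larger `beta` pushes the threshold down, so the guarantee is paid for with longer summaries.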

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > Canada (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Automatic Question & Answer Generation Using Generative Large Language Model (LLM)

Ehsan, Md. Alvee, Hasan, A. S. M Mehedi, Shahnoor, Kefaya Benta, Tasneem, Syeda Sumaiya

arXiv.org Artificial Intelligence, Sep-30-2025

In education, student evaluation is as important as imparting knowledge. To be evaluated, students usually go through text-based academic assessments. Instructors need to prepare a diverse set of questions that is fair to all students and lets them demonstrate their competence in a particular topic. This can be quite challenging, as instructors may need to go through several different lecture materials manually. Our objective is to make this whole process much easier by implementing Automatic Question Answer Generation (AQAG) using a fine-tuned generative LLM. Prompt Engineering (PE) is used to tailor the output to the instructor's preferred question style (MCQ, conceptual, or factual questions). In this research, we propose to leverage unsupervised learning methods in NLP, primarily focusing on the English language. This approach empowers the base Meta-Llama 2-7B model to integrate the RACE dataset as training data for the fine-tuning process, creating a customized model that offers efficient solutions for educators, instructors, and individuals engaged in text-based evaluations. A reliable and efficient tool for generating questions and answers can free up valuable time and resources, thus streamlining their evaluation processes.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.19475

Genre: Research Report > New Finding (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Document Summarization with Conformal Importance Guarantees

Kuwahara, Bruce, Lin, Chen-Yuan, Huang, Xiao Shi, Leung, Kin Kwan, Yapeter, Jullian Arta, Stanevich, Ilya, Perez, Felipe, Cresswell, Jesse C.

arXiv.org Artificial Intelligence, Sep-26-2025

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.20461

Country: North America > Canada (0.15)

Genre: Research Report (1.00)

Industry: Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Efficient Extractive Text Summarization for Online News Articles Using Machine Learning

Biswas, Sajib, Biswas, Milon, Mandal, Arunima, Liza, Fatema Tabassum, Sarker, Joy

arXiv.org Artificial Intelligence, Sep-22-2025

In the age of information overload, content management for online news articles relies on efficient summarization to enhance accessibility and user engagement. This article addresses the challenge of extractive text summarization by employing advanced machine learning techniques to generate concise and coherent summaries while preserving the original meaning. Using the Cornell Newsroom dataset, comprising 1.3 million article-summary pairs, we developed a pipeline leveraging BERT embeddings to transform textual data into numerical representations. By framing the task as a binary classification problem, we explored various models, including logistic regression, feed-forward neural networks, and long short-term memory (LSTM) networks. Our findings demonstrate that LSTM networks, with their ability to capture sequential dependencies, outperform baseline methods like Lede-3 and simpler models in F1 score and ROUGE-1 metrics. This study underscores the potential of automated summarization in improving content management systems for online news platforms, enabling more efficient content organization and enhanced user experiences.
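For readers unfamiliar with the baseline and metric named in this abstract, the following is a minimal sketch of a Lede-3 baseline (take the first three sentences) and a simplified ROUGE-1 F1 score. The function names are hypothetical, and the whitespace tokenization is an assumption: standard ROUGE tooling applies its own tokenization and optional stemming, so scores from this sketch will not match published numbers.

```python
from collections import Counter

def lede_3(sentences):
    """Baseline summary: the first three sentences of the article."""
    return sentences[:3]

def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per word (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Any learned extractor can then be compared with the baseline by scoring both outputs against the same reference summary.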

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.15614

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Media > News (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Cascaded Architecture for Extractive Summarization of Multimedia Content via Audio-to-Text Alignment

Hossain, Tanzir, Islam, Ar-Rafi, Hossain, Md. Sabbir, Rasel, Annajiat Alim

arXiv.org Artificial Intelligence, Apr-10-2025

This study presents a cascaded architecture for extractive summarization of multimedia content via audio-to-text alignment. The proposed framework addresses the challenge of extracting key insights from multimedia sources like YouTube videos. It integrates audio-to-text conversion using Microsoft Azure Speech with advanced extractive summarization models, including Whisper, Pegasus, and Facebook BART XSum. The system employs tools such as Pytube, Pydub, and SpeechRecognition for content retrieval, audio extraction, and transcription. Linguistic analysis is enhanced through named entity recognition and semantic role labeling. Evaluation using ROUGE and F1 scores demonstrates that the cascaded architecture outperforms conventional summarization methods, despite challenges like transcription errors. Future improvements may include model fine-tuning and real-time processing. This study contributes to multimedia summarization by improving information retrieval, accessibility, and user experience.

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.06275

Country: Asia > Bangladesh (0.15)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

OrderSum: Semantic Sentence Ordering for Extractive Summarization

Kwon, Taewan, Lee, Sangyong

arXiv.org Artificial Intelligence, Feb-22-2025

The sentence-level framework defines extractive summarization as an individual sentence selection problem, determining whether each sentence in a document should be included in the summary. However, the sentence-level framework often produces summaries that contain only general sentences or repeat important but similar sentences (Narayan et al., 2018b; Zhong et al., 2020). The summary-level framework overcomes this limitation by defining extractive summarization as a summary ranking problem rather than a sentence selection problem. The main idea of the summary-level framework is to generate a set of candidate summaries consisting of different sentences, and then rank them to select the best summary. By considering sentence composition at the entire summary level rather than sentence by sentence, this approach enables each sentence in the summary to convey different, specific information (Narayan et al., 2018b; Zhong et al., 2020). Previous work in both frameworks has primarily focused on improving which sentences to include in the summary, or in other words, sentence inclusion. However, to the best of our knowledge, the importance of sentence order in summaries has not been highlighted since the era of graph-based extractive summarization (Mihalcea and Tarau, 2004; Erkan and Radev, 2004). The sentence order of a text plays a crucial role not only in readability but also in its meaning (Yin et al., 2019; Logeswaran et al., 2018). Table 1 illustrates how the
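The summary-level idea described above (build candidate summaries, then rank whole candidates) can be sketched as follows. This is only an illustration of the framing, not the OrderSum method: the function names are hypothetical, and the toy scorer that counts distinct words is an assumption standing in for a learned ranking model.

```python
from itertools import combinations

def candidate_summaries(sentences, size):
    """Every candidate summary made of `size` sentences, in document order."""
    return [list(combo) for combo in combinations(sentences, size)]

def best_summary(sentences, size, score_fn):
    """Rank whole candidates with score_fn and return the top one."""
    return max(candidate_summaries(sentences, size), key=score_fn)

def distinct_word_count(summary):
    """Toy scorer (assumption): rewards candidates whose sentences add new words."""
    return len({word for sent in summary for word in sent.lower().split()})

doc = ["the cat sat", "the cat sat down", "dogs bark loudly"]
print(best_summary(doc, 2, distinct_word_count))
```

Because the scorer sees the full candidate, two near-duplicate sentences score worse together than either does with a complementary sentence, which is the redundancy problem the sentence-level framing cannot express.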

candidate summary, sentence order, summarization, (15 more...)

arXiv.org Artificial Intelligence

2502.16180

Country:

Europe > Greece (0.28)
North America > United States > Colorado > Denver County > Denver (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(20 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Government > Regional Government > Europe Government (0.67)
Leisure & Entertainment > Sports > Cricket (0.46)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

State Space Models for Extractive Summarization in Low Resource Scenarios

Khayi, Nisrine Ait

arXiv.org Artificial Intelligence, Jan-24-2025

Extractive summarization involves selecting the most relevant sentences from a text. Recently, researchers have focused on advancing methods to improve state-of-the-art results in low-resource settings. Motivated by these advancements, we propose the MPoincareSum method. This method applies the Mamba state space model to generate the semantics of reviews and sentences, which are then concatenated. A Poincare compression is used to select the most meaningful features, followed by the application of a linear layer to predict sentence relevance based on the corresponding review. Finally, we paraphrase the relevant sentences to create the final summary. To evaluate the effectiveness of MPoincareSum, we conducted extensive experiments using the Amazon review dataset. The performance of the method was assessed using ROUGE scores. The experimental results demonstrate that MPoincareSum outperforms several existing approaches in the literature.

computational linguistic, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2501.14673

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > Mexico (0.04)
(3 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Deep Learning and Machine Learning -- Natural Language Processing: From Theory to Application

Chen, Keyu, Fei, Cheng, Bi, Ziqian, Liu, Junyu, Peng, Benji, Zhang, Sen, Pan, Xuanhe, Xu, Jiawei, Wang, Jinlang, Yin, Caitlyn Heqi, Zhang, Yichao, Feng, Pohsun, Wen, Yizhu, Wang, Tianyang, Li, Ming, Ren, Jintao, Niu, Qian, Chen, Silin, Hsieh, Weiche, Yan, Lawrence K. Q., Liang, Chia Xin, Xu, Han, Tseng, Hong-Ming, Song, Xinyuan, Liu, Ming

arXiv.org Artificial Intelligence, Dec-17-2024

With a focus on natural language processing (NLP) and the role of large language models (LLMs), we explore the intersection of machine learning, deep learning, and artificial intelligence. As artificial intelligence continues to revolutionize fields from healthcare to finance, NLP techniques such as tokenization, text classification, and entity recognition are essential for processing and understanding human language. This paper discusses advanced data preprocessing techniques and the use of frameworks like Hugging Face for implementing transformer-based models. Additionally, it highlights challenges such as handling multilingual data, reducing bias, and ensuring model robustness. By addressing key aspects of data processing and model fine-tuning, this work aims to provide insights into deploying effective and ethically sound AI solutions.

information retrieval, large language model, machine learning, (25 more...)

arXiv.org Artificial Intelligence

2411.05026

Country:

North America > United States (1.00)
Asia (1.00)

Genre:

Workflow (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.67)
(2 more...)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Information Technology > Services (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CHIMA: Headline-Guided Extractive Summarization for Thai News Articles

Kositcharoensuk, Pimpitchaya, Sritrakool, Nakarin, Pratanwanich, Ploy N.

arXiv.org Artificial Intelligence, Dec-2-2024

Text summarization is a process of condensing lengthy texts while preserving their essential information. Previous studies have predominantly focused on high-resource languages, while low-resource languages like Thai have received less attention. Furthermore, earlier extractive summarization models for Thai texts have primarily relied on the article's body, without considering the headline. This omission can result in the exclusion of key sentences from the summary. To address these limitations, we propose CHIMA, an extractive summarization model that incorporates the contextual information of the headline for Thai news articles. Our model utilizes a pre-trained language model to capture complex language semantics and assigns a probability to each sentence to be included in the summary. By leveraging the headline to guide sentence selection, CHIMA enhances the model's ability to recover important sentences and discount irrelevant ones. Additionally, we introduce two strategies for aggregating headline-body similarities, simple average and harmonic mean, providing flexibility in sentence selection to accommodate varying writing styles. Experiments on publicly available Thai news datasets demonstrate that CHIMA outperforms baseline models across ROUGE, BLEU, and F1 scores. These results highlight the effectiveness of incorporating the headline-body similarities as model guidance. The results also indicate an enhancement in the model's ability to recall critical sentences, even those scattered throughout the middle or end of the article. With this potential, headline-guided extractive summarization offers a promising approach to improve the quality and relevance of summaries for Thai news articles.
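The two aggregation strategies named in this abstract, simple average and harmonic mean, behave differently when the combined values disagree. The sketch below shows that difference. The abstract does not say which quantities CHIMA aggregates, so pairing a per-sentence inclusion probability with a headline similarity is an assumption, as are all function names and the example numbers.

```python
def simple_average(a, b):
    """Arithmetic mean of two scores."""
    return (a + b) / 2

def harmonic_mean(a, b):
    """Harmonic mean of two non-negative scores; 0.0 when both are zero."""
    if a + b == 0:
        return 0.0
    return 2 * a * b / (a + b)

def rank_sentences(probabilities, headline_similarities, aggregate):
    """Sentence indices sorted from highest to lowest aggregated score."""
    combined = [aggregate(p, s) for p, s in zip(probabilities, headline_similarities)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)

# Sentence 0: high probability but low headline similarity.
# Sentence 1: moderate on both.
probs = [0.9, 0.5]
sims = [0.2, 0.5]
print(rank_sentences(probs, sims, simple_average))
print(rank_sentences(probs, sims, harmonic_mean))
```

The simple average lets one strong signal compensate for a weak one, while the harmonic mean favors sentences that score reasonably on both, which is one way the choice could adapt to different writing styles.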

large language model, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2412.01624

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)
(4 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)

SlideSpawn: An Automatic Slides Generation System for Research Publications

Kumar, Keshav, Chowdary, Ravindranath

arXiv.org Artificial Intelligence, Nov-20-2024

Research papers are well-structured documents. They have text, figures, equations, tables, etc., to convey their ideas and findings. They are divided into sections like Introduction, Model, Experiments, etc., which deal with different aspects of research. Characteristics like these set research papers apart from ordinary documents and allow us to significantly improve their summarization. In this paper, we propose a novel system, SlideSpawn, that takes a PDF of a research document as input and generates a quality presentation providing its summary in a visual and concise fashion. The system first converts the PDF of the paper to an XML document that has the structural information about various elements. Then a machine learning model, trained on the PS5K dataset and the Aminer 9.5K Insights dataset (that we introduce), is used to predict the salience of each sentence in the paper. Sentences for slides are selected using ILP and clustered based on their similarity, with each cluster being given a suitable title. Finally, a slide is generated by placing any graphical element referenced in the selected sentences next to them. Experiments on a test set of 650 pairs of papers and slides demonstrate that our system generates presentations with better quality.
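The selection step can be pictured as a small 0/1 optimization: choose sentences that maximize total salience without exceeding a length budget. The abstract does not give the actual ILP objective or constraints, so this formulation is an assumption. The sketch below also swaps the ILP solver for an exact dynamic program over the budget, which solves the same single-constraint problem with only the standard library. All names are hypothetical.

```python
def select_sentences(saliences, lengths, budget):
    """Indices of sentences maximizing total salience within a word budget.

    Solves: maximize sum(salience_i * x_i)
            subject to sum(length_i * x_i) <= budget, x_i in {0, 1}.
    Lengths and budget are non-negative integers (word counts).
    """
    # best[b] = (best total salience, chosen indices) using at most b words
    best = [(0.0, [])] * (budget + 1)
    for i, (sal, length) in enumerate(zip(saliences, lengths)):
        updated = list(best)
        for b in range(length, budget + 1):
            score = best[b - length][0] + sal
            if score > updated[b][0]:
                updated[b] = (score, best[b - length][1] + [i])
        best = updated
    return best[budget][1]

saliences = [0.9, 0.5, 0.6]   # hypothetical model outputs
lengths = [10, 4, 5]          # hypothetical sentence lengths in words
print(select_sentences(saliences, lengths, budget=9))
```

A real ILP formulation can add constraints this sketch cannot express, such as limiting redundancy between selected sentences or requiring coverage of each paper section.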

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2411.17719

Country:

North America > United States > Florida > Miami-Dade County > Miami > Coconut Grove (0.04)
Europe > Spain (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
